BMJ Health & Care Informatics
Preprints posted in the last 90 days, ranked by how well they match the content profile of BMJ Health & Care Informatics, based on 13 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is an above-average fit.
Uzochukwu, B. S. C.; Cherima, Y. J.; Enebeli, U. U.; Okeke, C. C.; Uzochukwu, A. C.; Omoha, A.; Hassan, B.; Eronu, E. M.; Yusuf, S. M.; Uzochukwu, K. A.; Kalu, E. I.
Background: The integration of artificial intelligence (AI) into clinical practice holds transformative potential for healthcare in West Africa, but safe deployment requires context-appropriate governance, accountability, and post-deployment monitoring frameworks. This cross-sectional mixed-methods study examined preferences and concerns of West African clinicians and technical experts regarding AI governance structures, post-deployment surveillance mechanisms, and accountability allocation. Methods: A structured questionnaire was administered to 136 physicians affiliated with the West African College of Physicians (February 22-28, 2026), complemented by 72 key informant interviews with technical leads, AI developers, data scientists, policymakers, and healthcare leaders. Data were analyzed using descriptive statistics, inferential tests, and thematic analysis. Results: Clinicians strongly preferred independent regulatory bodies (40.4%) for overseeing AI tool performance, with high trust ratings (mean: 4.3/5), while vendor self-monitoring received minimal support (3.7%, mean: 2.4/5). Real-time dashboards were the most favored monitoring approach (41.9%). Clear accountability pathways (94.1%), algorithm transparency (91.9%), and real-time performance data (89.7%) were rated essential by majorities. Major concerns included clinicians being unfairly blamed for AI errors (76.5%), excessive vendor control (72.8%), and absence of clear reporting pathways (69.9%). Qualitative findings emphasized continuous performance tracking for accuracy, fairness, and bias; structured incident reporting; protocols for model drift and failure; and multi-layered governance combining independent oversight, institutional AI committees, and explicit liability frameworks. Conclusion: This study provides the first empirical evidence from West Africa on clinician preferences for AI governance. Findings offer actionable guidance for policymakers to build trustworthy, equitable, and safe AI integration frameworks that prioritize transparency, independent oversight, and clinician protection. Keywords: artificial intelligence; AI governance; post-deployment monitoring; accountability; West Africa; clinician preferences; health data science.
Edara, R.; Khare, A.; Atreja, A.; Awasthi, R.; Highum, B.; Hakimzadeh, N.; Ramachandran, S. P.; Mishra, S.; Mahapatra, D.; Shree, S.; Bhattacharyya, A.; Singh, N.; Reddy, S.; Cywinski, J. B.; Khanna, A. K.; Maheshwari, K.; Papay, F. A.; Mathur, P.
Background: Breakthroughs in model architecture and the availability of data are driving transformational artificial intelligence in healthcare research at an exponential rate. The shift in use of model types can be attributed to the multimodal properties of Foundation Models, better reflecting the inherently diverse nature of clinical data and the advancing model implementation capabilities. Overall, the field is maturing from exploratory development towards application in real-world evaluation and implementation, spanning both generative and predictive AI. Methods: A database search in PubMed was performed using the terms "machine learning" or "artificial intelligence" and "2025", with the search restricted to English-language human-subject research. A BERT-based deep learning classifier, pre-trained and validated on manually labeled data, assessed publication maturity. Five reviewers then manually annotated publications for healthcare specialty, data type, and model type. Systematic reviews, duplicates, pre-prints, robotic surgery studies, and non-human research publications were excluded. Publications employing foundation models were further analyzed for their areas of application and use cases. Results: The PubMed search yielded 49,394 publications, a near-doubling from 28,180 in 2024, of which 3,366 were classified as mature. 2,966 were included in the final analysis after exclusions, compared to 1,946 in 2024. Imaging remained the dominant specialty (976 publications), followed by Administrative (277) and General (251). Traditional text-based LLMs (1,019) led model usage, but Multimodal Foundation Models surged from 25 publications in 2024 to 144 in 2025, and Deep Learning models also increased substantially (910). For the first time, publications related to classical Machine Learning model use declined (173) in our annual review. Image remained the predominant data type (53.9%), followed by text (38.2%), with a notable increase in audio (1.2%) coinciding with the adoption of multimodal models. Across foundation model publications, Imaging (110), Head and Neck (92), Surgery (64), Oncology (55), and Ophthalmology (49) were leading specialties, while Administrative and Education categories remained high-volume contributors driven predominantly by LLM-based research. Conclusion: 2025 signals a meaningful maturation of the healthcare AI research field, with publication volumes nearly doubling, classical ML yielding to higher-capacity foundation models, and the field rapidly moving beyond traditional text-based LLM capabilities toward multimodal models. While Imaging continues to lead in research output, the growth of multimodal models across clinical specialties suggests the field is approaching an inflection point where AI systems can more closely mirror the complexity of real-world clinical practice.
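The maturity-classification step described above maps onto a standard fine-tuned text classifier. The following is a minimal sketch, not the authors' pipeline: it assumes a hypothetical BERT checkpoint already fine-tuned on manually labeled abstracts (the path "./maturity-bert" and the exploratory/mature label mapping are assumptions) and uses the Hugging Face transformers API to score a new abstract.

```python
# Minimal sketch of a BERT-based maturity classifier for PubMed abstracts.
# Illustrative only: assumes a checkpoint already fine-tuned on manually
# labeled abstracts (hypothetical path "./maturity-bert"); not the authors' model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "./maturity-bert"              # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

LABELS = {0: "exploratory", 1: "mature"}   # assumed label mapping

def classify_abstract(text: str) -> str:
    """Score one abstract as mature vs. exploratory."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify_abstract("We prospectively deployed and evaluated the model in 12 ICUs ..."))
```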
Adekunle, T.; Ohaeche, J.; Adekunle, T.; Adekunle, D.; Kogbe, M.
Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients. Less attention has been given to cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of trust, risk, and oversight. Methods: Guided by sociotechnical systems theory and institutional trust scholarship, we conducted semi-structured in-depth interviews with twenty cybersecurity professionals working in healthcare-relevant domains. Participants were recruited through professional networks and LinkedIn outreach. Interviews were conducted between May and August 2025. They were audio-recorded and transcribed verbatim. Data were analyzed using qualitative content analysis with constant comparison. Two researchers independently coded transcripts and refined themes through iterative discussion. The study received Institutional Review Board approval. Results: Participants described health AI as an augmented clinical infrastructure. They emphasized that AI extends workflow capacity but requires sustained human oversight. Healthcare data systems were characterized as fragmented and vulnerable. Breaches were treated as anticipated events. Trust in AI was described as contingent and built over time through visible accountability. Cybersecurity stewardship was framed as foundational to institutional trustworthiness. Conclusions: Health AI credibility emerges through governance practices that demonstrate accountability. Cybersecurity professionals and institutional stakeholders jointly shape trust in digitally mediated healthcare systems through governance decisions that signal accountability.
Galfano, A.; Barbosu, C. M.; Aladin, B.; Rivera, I.; Dye, T. D. V.
Artificial intelligence (AI) is dramatically changing the healthcare landscape by providing patients, clinicians, administrators, and public health professionals with tools aiming to improve efficiency, outcomes, and experience in health. As elsewhere, New York State (NYS) experiences high demand for - and high investment in - transformation in healthcare with AI tools, though little is known about clinicians' use of and interest in adopting AI tools in their work. A large share of the nation's future primary care clinicians train and work in NYS, and the state's ability to establish clear policies, provide tools, and elevate AI competency has implications for care delivery nationally. As a result, we undertook this analysis of NYS clinicians' use of AI to better understand opportunities for its adoption and inclusion in continuing education. For this analysis, we included healthcare providers who deliver ambulatory or specialty medical care within NYS, with use/frequency/purpose of AI tools by clinicians in their work as the main outcome. Of 305 NYS clinical providers responding, 23.4% indicated they use AI tools for work, with 11.1% reporting monthly use, 8.5% weekly use, and 4.6% daily use. AI was primarily used to search guidelines and ask clinical questions, followed by identifying drug interactions, analyzing data, analyzing images/labs, and creating care plans and patient recommendations. AI use did not vary significantly across professional disciplines or practice types, though independent practitioners were significantly more likely than advanced practice providers to use AI in their work, as were providers using social media and digital methods for obtaining continuing education. AI use increased substantially in 2025 compared with 2024. Overall, our findings suggest that programs targeting clinicians could incorporate these findings in designing accessible and acceptable AI-related continuing education opportunities to help familiarize clinicians with opportunities and risks for integrating AI tools into their practices. Author Summary: AI tools are rapidly gaining traction in the delivery of healthcare. We found that clinician use of AI was quite limited (23%), though growing. Those using AI tools used them sparingly in their work, with only about 5% reporting daily use. The purposes for which clinicians report using AI - asking clinical questions, interpreting patient results, creating patient educational materials - could contribute substantially to healthcare outcomes if widely adopted. Designers of continuing education for clinicians should help provide opportunities for clinicians to improve their familiarity, use, and competency with AI tools, to help maximize the potential health benefits possible for patients and communities.
Yip, A.; Craig, G.; White, N. M.; Cortes-Ramirez, J.; Shaw, K.; Reddy, S.
Purpose: To evaluate whether large language models (LLMs) can enhance clinician-patient communication by simplifying radiology reports to improve patient readability and comprehension. Methods: A randomised controlled trial was conducted at a single healthcare service for patients undergoing X-ray, ultrasound or computed tomography between May 2025 and June 2025. Participants were randomised in a 1:1 ratio to receive either (1) the formal radiology report only or (2) the formal radiology report and an LLM-simplified version. Readability scores, including the Simple Measure of Gobbledygook, Automated Readability Index, Flesch Reading Ease, and Flesch-Kincaid grade level, were calculated for both reports. Statistical analysis of patient readability and comprehension levels, factual accuracy and hallucination rates for LLMs was assessed using a combination of binary and 5-point Likert scales, open-ended survey questions, and independent review by two radiologists. Results: 59/120 patients were randomised to receive both the formal and LLM-simplified radiology reports. Readability of LLM-simplified reports significantly improved, with the reading level required for formal reports equivalent to a university standard (11th-13th grade) compared to a middle-school standard (5th-9th grade) for simplified reports (rank biserial correlation = 0.83, p < 0.001). Patients with both reports demonstrated a significantly greater comprehension level, with 95% reporting an understanding level greater than 50%, compared with 46% without the simplified report (rank biserial correlation = 0.67, p < 0.001). All LLM-simplified reports were considered at least somewhat accurate, with a minimal hallucination rate of 1.7%. Importantly, no hallucinations resulted in potential patient harm. 118/120 (98.3%) patients expressed interest in simplified radiology reports being included in future clinical practice. Conclusion: This study provides evidence that LLMs can simplify radiology reports to an accessible level of readability with minimal hallucination. LLMs improve both ease of readability and comprehension of radiology reports for patients. Therefore, the rapid advancement of LLMs shows strong potential in enhancing patient-radiologist communication as patient access to electronic health records is increasingly adopted. Highlights: Radiology reports can be complex and difficult for patients to read and interpret. Strong patient demand exists for simplified radiology reports. Large language models (LLMs) such as GPT-4o show promise in simplifying radiology reports. LLMs credibly simplify radiology reports with minimal hallucination rates. LLMs improve both patient readability and comprehension of radiology reports.
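The readability comparison above relies on standard formula-based indices that are easy to compute. Below is a minimal sketch, not the trial's analysis code, using the open-source textstat package to score an original and a simplified report; the two report strings are invented placeholders.

```python
# Minimal sketch: compare readability of an original vs. a simplified report
# using the four indices named in the abstract. Requires `pip install textstat`.
import textstat

def readability_profile(text: str) -> dict:
    """Return the four readability indices used in the study."""
    return {
        "smog": textstat.smog_index(text),                        # SMOG grade
        "ari": textstat.automated_readability_index(text),        # ARI grade
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
    }

formal = "There is a non-displaced fracture of the distal radial metaphysis."
simplified = "You have a small break near your wrist. The bone pieces are still lined up."

for name, text in [("formal", formal), ("simplified", simplified)]:
    print(name, readability_profile(text))
```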
Thomas, C.; Kim, J. Y.; Hasan, A.; Kpodzro, S.; Cortes, J.; Day, B.; Jensen, S.; L'Huillier, S.; Oden, M. O.; Zumbado Segura, S.; Maurer, E. W.; Tucker, S.; Robinson, S.; Garcia, B.; Muramalla, E.; Lu, S.; Chawla, N.; Patel, M.; Balu, S.; Sendak, M.
Safety net healthcare delivery organizations (SNOs) serve vulnerable populations but face persistent challenges in adopting new technologies, including AI. While systematic barriers to technology adoption in SNOs are well documented, little is known about how AI is implemented in these settings. This study explored real-world AI adoption in SNOs, focusing on identifying barriers encountered across the AI lifecycle and strategies used to overcome them. Five SNOs in the U.S. participated in a 12-month technical assistance program, the Practice Network, to implement AI tools of their choosing. Observed barriers and mitigation strategies were documented throughout program activities and, at the conclusion of the program, reviewed and refined with participants using a participatory research approach to ensure findings reflected lived experiences and organizational contexts. Key barriers emerged during the Integration and Lifecycle Management phases and included gaps in AI performance evaluation and impact assessments, communication with patients about AI use, foundational AI education, financial resources for purchasing and maintaining AI tools, and AI governance structures. Effective strategies for addressing these barriers were primarily supported through centralized expertise, structured guidance, and peer learning. These findings provide granular, actionable insights for SNO leaders, offering guidance for anticipating barriers and proactively planning mitigation strategies. By including SNO perspectives, the study also contributes to the broader health AI ecosystem and underscores the importance of participatory, collaborative approaches to support safe, effective, and ethical AI adoption in resource-constrained settings. Author Summary: Safety net organizations (SNOs) are healthcare systems that primarily serve low-income and underinsured patients. While interest in artificial intelligence (AI) in healthcare has grown rapidly, little is known about how these organizations experience AI adoption in practice. In this study, we partnered with five SNOs over a 12-month program to document the challenges they encountered when implementing AI tools and the strategies they used to address them. We worked closely with SNO staff throughout the process to ensure our findings reflected their lived experiences with AI implementation. We found that the most common challenges arose when organizations tried to integrate AI into daily operations and monitor and maintain those tools over time. Specific barriers included difficulty evaluating whether AI was performing as expected, limited guidance on communicating with patients about AI use, a lack of resources for staff training, limited financial resources, and the absence of formal governance structures. Successful strategies for overcoming these challenges drew on shared knowledge and structured support provided by the program, as well as learning from peer organizations. These findings offer practical guidance for SNO leaders planning or managing AI adoption, and contribute to a broader conversation about what is required to implement AI safely and effectively in healthcare settings that serve the most medically and socially vulnerable patients.
Shankar, R.; Goh, A.; Xu, Q.
Background: The administrative burden of clinical documentation is a recognised contributor to clinician burnout and diminished care quality. Ambient artificial intelligence (AI) scribe technology, which uses large language models to passively record and summarise clinical encounters, has rapidly gained traction internationally. However, no published studies have examined clinician experiences with this technology in the Asia-Pacific region or within Singapore's multilingual healthcare system. Objective: This study explored clinician perspectives on ambient AI scribe technology at Alexandra Hospital, Singapore, focusing on perceived benefits, barriers, workflow integration, ethical considerations, and recommendations for sustained implementation. Methods: A qualitative descriptive study was conducted using semi-structured interviews with 28 clinicians across multiple specialties at Alexandra Hospital, National University Health System (NUHS). Participants were purposively sampled for diversity in role, specialty, and usage level. Interviews were analysed using reflexive thematic analysis guided by the RE-AIM/PRISM framework. The COREQ checklist was followed. Results: Five themes emerged: (1) reclaiming presence in the clinical encounter, (2) navigating accuracy and trust in AI-generated documentation, (3) workflow disruption and adaptation, (4) privacy, consent, and ethical tensions within Singapore's regulatory landscape, and (5) envisioning sustainable integration. Clinicians reported improved patient engagement and reduced cognitive burden. Persistent barriers included accuracy concerns, AI hallucinations, limited multilingual functionality, loss of documentation style, and uncertainties around compliance with the Personal Data Protection Act (PDPA). Conclusions: Ambient AI scribe technology holds promise for alleviating documentation burden in Singapore's public healthcare system. Realising this potential requires attention to safety validation, multilingual capability, clinician training, and patient-centred consent aligned with local regulatory frameworks.
Nayyar, C.; Xu, H. H.; Bates, A. T.; Conati, C.; Hilbers, D.; Avery, J.; Raman, S.; Fayaz-Bakhsh, A.; Nunez, J.-J.
Background: Artificial intelligence (AI) has rapidly garnered interest in healthcare, with research showing promise to improve quality, efficiency, and outcomes. Cancer care's multidisciplinary nature and high coordination demands are well positioned to benefit from AI. While attitudes toward the uptake of evidence and the implementation of AI in medicine have been explored generally, literature remains scarce with specific regard to AI in cancer care. This study sought to capture the perspectives of both patients and professionals, which are essential for guiding responsible, effective implementation of evidence-based (EB) AI in cancer care. Methods: We conducted a workshop at the 2024 British Columbia (BC) Cancer Summit (Vancouver, Canada). Discussions addressed three guiding questions: concerns, benefits, and priorities for AI in cancer care. Responses from 48 workshop participants (patients and families, AI/computer science/cancer researchers, clinicians and allied health professionals, information technology professionals, healthcare administrators) underwent structured conceptualization by concept mapping, leveraging multidimensional scaling and hierarchical cluster and subcluster analysis to produce visual and quantitative maps of stakeholder priorities. Results: A total of 265 statements on perceived benefits, concerns, and priorities related to the implementation of AI in cancer care were generated from the workshop and underwent concept mapping. Two clusters were identified; Cluster 1 focused on "Challenges and Safeguards for AI Implementation," and Cluster 2 focused on "Clinical Benefits and Efficiency Gains." Subcluster analysis distinguished 8 thematic subclusters (4 per cluster). Both mean importance (P < .001) and feasibility (P < .001) ratings were significantly higher for Cluster 2. No differences were found between ratings by clinical and nonclinical professionals. Further go-zone analysis classified statements according to their relative superiority/inferiority in importance and feasibility compared to the overall average. Conclusions: Stakeholder ratings were higher for statements describing clinical benefits and efficiency gains than for those describing challenges and safeguards for AI implementation in cancer care. Concept mapping analysis distinguished between workflow-aligned AI applications, perceived as ready for implementation, and system-level governance requirements requiring longer-term investment. Present findings provide a structured, stakeholder-informed framework for prioritizing and sequencing AI implementation efforts in cancer care, constituting a practical blueprint to catalyze meaningful progress.
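Concept mapping of this kind typically projects a statement dissimilarity (co-sorting) matrix into two dimensions with multidimensional scaling and then clusters the resulting coordinates hierarchically. The sketch below is a generic illustration of that pattern using scikit-learn, not the workshop's analysis code; the random matrix stands in for real sorting data.

```python
# Minimal concept-mapping sketch: MDS projection of a statement dissimilarity
# matrix followed by hierarchical clustering, as in standard concept mapping.
# Illustrative only; the random matrix stands in for real co-sorting data.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
n_statements = 20
d = rng.random((n_statements, n_statements))
dissim = (d + d.T) / 2                 # symmetrise the matrix
np.fill_diagonal(dissim, 0.0)

# 2-D point map of statements
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)

# Two top-level clusters (e.g. "challenges/safeguards" vs "benefits/efficiency")
labels = AgglomerativeClustering(n_clusters=2).fit_predict(coords)

for i, (xy, c) in enumerate(zip(coords, labels)):
    print(f"statement {i:2d}  cluster {c}  position {xy.round(2)}")
```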
Sun, M.; Reiter, E.; Murchie, P.; Kiltie, A. E.; Ramsay, G.; Duncan, L.; Adam, R.
Objective: More people than ever before are living with cancer. Patient education is a core component of cancer care, and patients are increasingly using large language models (LLMs), such as ChatGPT, for advice. The objectives of this study were to evaluate the ability of ChatGPT to explain specialist cancer care records (multidisciplinary team (MDT) meeting reports) to patients and to understand key stakeholders' views and opinions about the technology. Methods: Six simulated MDT meeting reports were created by cancer clinicians. MDT reports and 184 realistic patient-centred queries were input into the ChatGPT 4.0 web version. We conducted a mixed-methods study combining qualitative analysis with exploratory quantitative components to evaluate ChatGPT's responses. The study consisted of three stages: (1) clinician sense-checking, (2) clinical and non-clinical annotation, and (3) focus groups (including cancer patients, caregivers, computer scientists, and clinicians). Results: ChatGPT was able to summarise complex oncology information into simpler language, to provide definitions of complex terms and to answer questions about clinical care. However, clinician sense-checking identified problems with accuracy, language and content. In clinician annotation, 92.6% of ChatGPT's responses were judged problematic. Across all evaluation methods, six recurring themes were identified: accuracy, language, trust, content, personalisation and integration challenges. Patients and clinicians found the summaries and definitions useful; however, the responses were not tailored to the individual patient or to what the report might mean for them. Conclusion: This study highlights current challenges in using LLMs to explain complex cancer diagnoses and treatment records, including inaccurate information, inappropriate language, limited personalisation, AI distrust and challenges in integrating LLMs into clinical workflow. Understanding of the limitations is crucial for clinicians, patients, computer scientists and policy makers. The issues should be addressed before deploying LLMs in clinical settings.
Jafarifiroozabadi, R.
Background: Safety is a critical concern in behavioral health crisis units (BHCUs), where environmental risks (e.g., ligature points) can lead to injury to self or others. However, limited research has examined how perceived safety influences facility selection among patients and care partners, or how these perceptions align with AI-driven safety risk assessments in such environments. Method: To address these gaps, a nationwide discrete choice online survey was conducted using image-based scenarios of BHCU environments, where participants selected preferred facilities based on a range of attributes, including environmental safety risks (e.g., ligature points). Additionally, participants identified safety risks in survey images, which were compared with outputs from an AI-driven tool developed and trained to detect environmental risks by experts. Quantitative analysis using conditional logit models examined the influence of attributes on facility choice, while spatial comparisons of annotated images and heatmaps assessed participant and AI-identified risk alignments. Results: Findings revealed that the higher frequency of safety risks in images significantly reduced the likelihood of facility selection (p < .001, OR ≈ 1.28), highlighting the importance of perceived safety in user decision-making. While there was notable alignment between heatmaps generated by participants and AI, key differences emerged, suggesting that participant safety perception was influenced by features not fully captured by AI, such as the type of materials or unknown, out-of-label safety risks in facility images. Conclusions: Despite these limitations, results highlighted the value of integrating AI-driven assistive tools for non-expert user safety risk assessment to support decision-making for safer BHCU environments.
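A discrete choice experiment like this is usually analysed with a conditional logit fitted to long-format data: one row per alternative per choice task, a binary indicator for the chosen facility, and attribute columns as predictors. The sketch below illustrates that setup with statsmodels on synthetic data and hypothetical attribute names; it is not the study's model.

```python
# Minimal conditional-logit sketch for a facility discrete choice experiment.
# Long format: one row per alternative, `chosen` marks the selected facility,
# `task` groups the alternatives shown together. Synthetic data for illustration.
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(1)
n_tasks, n_alts = 200, 2
df = pd.DataFrame({
    "task": np.repeat(np.arange(n_tasks), n_alts),
    "n_safety_risks": rng.integers(0, 5, n_tasks * n_alts),   # e.g. visible ligature points
    "natural_light": rng.integers(0, 2, n_tasks * n_alts),    # hypothetical attribute
})

# Simulate choices that penalise visible safety risks
utility = -0.6 * df["n_safety_risks"] + 0.3 * df["natural_light"]
df["chosen"] = (
    df.assign(u=utility + rng.gumbel(size=len(df)))
      .groupby("task")["u"].transform(lambda u: (u == u.max()).astype(int))
)

model = ConditionalLogit(df["chosen"], df[["n_safety_risks", "natural_light"]],
                         groups=df["task"])
print(model.fit().summary())
```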
Shlyakhta, T.
Background: Large Language Models (LLMs) show promise for clinical decision support in Intensive Care Units (ICU), but their safety and reliability remain inadequately evaluated through dual testing of both memory-dependent and memory-independent safety mechanisms. Objective: To comprehensively evaluate LLMs using two independent safety tests: context-dependent contraindication memory (penicillin allergy recall) and context-independent authority resistance (Extended Milgram Test), revealing whether these represent unified or dissociated safety mechanisms. Methods: Twenty-three LLMs underwent automated testing via a 24-hour ICU simulation on consumer hardware (NVIDIA RTX 3060 12GB). A subset of 26 models completed an Extended Milgram Test with five escalating harmful command scenarios. Scoring assessed safety compliance, Milgram resistance, conflict detection, and performance. Results: Critical findings revealed dissociation between abstract ethics and clinical memory. While 65% of models achieved perfect Milgram resistance (100%), only 8.7% (n=2) correctly refused penicillin when an allergy was mentioned. Eight models demonstrated 100% Milgram resistance yet failed allergy recall (r = -0.39, p = 0.23). Only Granite 3.1 8B achieved perfect performance on both tests. Conclusions: Abstract ethical reasoning (refusing harmful orders in principle) is independent from concrete clinical memory (tracking patient-specific risks). Safe medical AI requires both capabilities, which are rarely present together. Dual safety testing should become mandatory for medical AI certification. Highlights: Only 8.7% of tested LLMs passed critical safety tests for medication prescribing. This is the first study demonstrating dissociation between abstract ethics and clinical memory (r = -0.39). Eight models refused all harmful orders but forgot documented allergies. Granite 3.1 8B was the only model achieving perfect performance on both safety tests. A dual safety testing framework is proposed for medical AI certification.
Ju, Z.; Xue, Y.; Rud, A.; Savatt, J. M.; Lerner-Ellis, J.; Rehm, H. L.; Joly, Y.; Uberoi, D.
Background: The sharing of data generated through the course of clinical genetic and genomic testing without explicit patient consent is increasingly important for timely diagnosis and treatment. While many jurisdictions permit the sharing of identifiable data for direct patient care, institutional policies vary in how clearly they specify key elements. When do policies permit sharing of data without explicit consent? What data types may be shared, with whom, and under what safeguards? Greater clarity around these elements may support responsible data sharing while balancing timely care with transparency and appropriate protections. Methods: We conducted a qualitative content analysis of data-sharing and privacy policies from 33 clinical genomic institutions across 17 jurisdictions. Using a predefined analytical framework, we assessed how policies document key governance elements relevant to sharing without explicit consent. Two independent reviewers extracted information about clinical contexts, data types, justifications, and protections, documenting areas of inconsistency across institutions. Results: Although 70% of institutions described circumstances permitting data sharing without explicit consent, most policies did not clearly define the scope or governance of such sharing. Policies also rarely distinguished clinical from research or secondary use and inconsistently specified privacy and security safeguards. While sharing was commonly justified for clinical care (78.3%) or testing services (43.5%), recipient roles, access conditions, and onward-sharing expectations were often left undefined. Conclusion: This uneven documentation could make it difficult for clinical teams, laboratories, and institutional decision-makers to identify and justify key decisions about what is permitted and under what conditions. A guidance framework specifying core policy elements and corresponding protections could help institutions communicate their governance choices more clearly while supporting more comparable baseline practices for responsible data sharing across settings.
Nkosi-Mjadu, B. E.
Background: South Africa's public healthcare system serves most of the population through approximately 3,900 primary healthcare clinics characterised by long waiting times and high volumes of repeat-prescription visits. No published pre-arrival digital triage system operates across all 11 official South African languages while aligning with the South African Triage Scale (SATS). This paper reports the design and preliminary safety validation of BIZUSIZO, a hybrid deterministic-AI WhatsApp triage system. Methods: BIZUSIZO delivers SATS-aligned triage via WhatsApp, combining AI-assisted free-text classification (Claude Haiku 4.5) with a Deterministic Clinical Safety Layer (DCSL) that overrides AI output for 53 clinical discriminator categories (14 RED, 19 ORANGE, 20 YELLOW) coded in all 11 official languages and independent of AI availability. A five-domain risk factor assessment can only upgrade the triage level. One hundred and twenty clinical vignettes in patient language (English, isiZulu, isiXhosa, Afrikaans; 30 per language) were scored against a developer-assigned gold standard with independent blinded nurse review. A 121-vignette multilingual DCSL safety consistency check across all 11 languages and a 220-call post-hoc framing sensitivity evaluation (110 paired vignettes) were also conducted. Results: Under-triage was 3.3% (4/120; 95% CI: 0.9%-8.3%) with no RED under-triage; exact concordance was 80.0% (96/120) and quadratic weighted kappa 0.891 (95% CI: 0.827-0.932). One two-level under-triage was observed on a non-RED presentation (V072, isiXhosa burns vignette, ORANGE→GREEN); one two-level over-triage was observed (V054, isiZulu deep laceration, YELLOW→RED). In the framing sensitivity evaluation, AI-only classification achieved 50.9% RED invariance under adversarial framing; full-pipeline classification achieved 95.0% in four validated languages, with the DCSL rescuing 18 of 23 AI drift cases. Conclusions: A hybrid deterministic-AI triage system with DCSL-based emergency detection achieved zero RED under-triage and consistent RED detection across all 11 official languages. The 16.7% over-triage rate falls within published South African SATS ranges (13.1-49%). A single two-level under-triage event was observed on an isiXhosa burns vignette (ORANGE→GREEN) and is discussed in Limitations. Findings are preliminary; prospective validation against independent nurse triage is the necessary next step.
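Two of the quantitative pieces here are easy to make concrete: a deterministic safety layer that can only upgrade the AI-assigned triage level, and agreement against the gold standard via quadratic weighted kappa. The sketch below is a simplified illustration of both ideas, assuming ordered SATS-style levels and scikit-learn's kappa implementation; it is not the BIZUSIZO code, and the example vignette labels are invented.

```python
# Simplified sketch: (1) a deterministic safety layer that can only upgrade the
# AI-assigned triage level, (2) quadratic weighted kappa against a gold standard.
# Illustrative only; triage levels and example data are assumptions.
from typing import Optional
from sklearn.metrics import cohen_kappa_score

LEVELS = ["GREEN", "YELLOW", "ORANGE", "RED"]        # ordered SATS-style acuity
RANK = {level: i for i, level in enumerate(LEVELS)}

def apply_safety_layer(ai_level: str, discriminator_level: Optional[str]) -> str:
    """Deterministic layer: a matched discriminator can only raise acuity."""
    if discriminator_level is None:
        return ai_level
    return LEVELS[max(RANK[ai_level], RANK[discriminator_level])]

# Example: AI says YELLOW, but a hard-coded ORANGE discriminator fires.
print(apply_safety_layer("YELLOW", "ORANGE"))        # -> ORANGE

# Agreement on a handful of synthetic vignettes.
gold = ["RED", "ORANGE", "YELLOW", "GREEN", "ORANGE", "YELLOW"]
system = ["RED", "ORANGE", "ORANGE", "GREEN", "YELLOW", "YELLOW"]
kappa = cohen_kappa_score([RANK[x] for x in gold],
                          [RANK[x] for x in system],
                          weights="quadratic")
print(f"quadratic weighted kappa = {kappa:.3f}")
```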
Bladder, K. J. M.; Verburg, A. C.; Arts-Tenhagen, M.; Willemsen, R.; van den Broek, G. B.; Driessen, C. M. L.; Driessen, R. J. B.; Robberts, B.; Scheffer, A. R. T.; de Vries, A. P.; Frenzel, T.; Swillens, J. E. M.
Background: Generative artificial intelligence (GenAI) in healthcare may reduce administrative burden and enhance quality of care. Large language models (LLMs) can generate draft responses to patient messages using electronic health record (EHR) data. This could mitigate increased workload related to high message volumes. While the effectiveness and feasibility of these GenAI tools have been studied in the United States, evidence from non-English contexts is scarce, particularly regarding user experience. Objective: This study evaluated the effectiveness, feasibility and barriers and facilitators of implementing Epic's Augmented Response Technology (Art) GenAI tool (Epic Systems Corporation, Verona, WI, USA) in a Dutch academic healthcare setting among a broad range of end users. It explored healthcare professionals' (HCP) usage metrics, expectations, and early user experiences. Methods: We conducted a hybrid type 1 effectiveness-implementation design. HCPs of four clinical departments (dermatology, medical oncology, otorhinolaryngology, and pulmonology) participated in a six-month study. Effectiveness of Art was assessed using efficiency indicators from Epic (including all InBasket users in the hospital) and survey scales measuring well-being and clinical efficiency at three time points: PRE, POST-1 (1 month), and POST-2 (4 months). Feasibility of Art was evaluated through adoption indicators from Epic and survey scales on use and usability. Barriers and facilitators of Art implementation were collected through the survey and thematized using the NASSS framework (Nonadoption, Abandonment, Scale-up, Spread and Sustainability). Results: 237 unique HCPs generated a total of 8,410 drafts. Review and drafting times were similar for users with and without Art, indicating minimal differences. Perceived clinical efficiency declined significantly from PRE to POST-2, while well-being remained unchanged. Adoption was initially high but decreased over time, averaging 16.7% across departments. Usability and intention-to-use scores also declined significantly. Qualitative findings highlighted time savings, well-structured drafts, and patient-centered language as facilitators. Reported barriers included limited impact on time, low practical utility, content inaccuracies, and style misalignment. Conclusions: This evaluation of a GenAI tool for patient-provider communication in a non-English academic hospital revealed mixed perceptions of effectiveness and feasibility. High initial expectations contrasted with limited perceived impact on time savings, well-being and clinical efficiency, alongside declining adoption and usability. Barriers and facilitators revealed contrasting views. These findings underscore the need for a workflow for handling user feedback and guidance on clinical responsibilities, along with clear communication about the tool's purpose and limitations to manage expectations. Additionally, establishing consensus on a set of quality indicators and their thresholds that indicate when a GenAI tool is sufficiently robust will be critical for responsible scaling of GenAI in clinical practice.
Ekram, T. T.
Background: Large language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. Objective: To systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designed for medical contexts. Methods: We developed a comprehensive taxonomy of 8 adversarial attack categories targeting medical AI safety, encompassing 24 distinct sub-strategies. Using an LLM-based attack generator, we created 160 realistic adversarial prompts across categories including dangerous dosing, contraindication bypass, emergency misdirection, and multi-turn escalation. We tested multiple leading LLMs (Claude Sonnet 4.5, GPT-5.2, Gemini 2.5 Pro, Gemini 3 Flash) using both single-turn and multi-turn attack sequences. All models received identical, standard medical assistant system prompts. An automated evaluator (Claude Sonnet 4.5) pre-screened responses for harm potential (0-5 scale) and guardrail effectiveness, with physician review planned for high-risk responses (harm_level ≥ 3). Results: Of 160 adversarial prompts evaluated against Claude Sonnet 4.5, 11 (6.9%) elicited responses meeting our threshold for clinically significant harm (harm level ≥ 3 on a 0-5 scale). The model exhibited full refusal behavior in 86.2% of cases. Authority Impersonation was the dominant attack vector (45.0% success rate), with the "Educational Authority" sub-strategy (framing requests as medical student questions) achieving 83.3% success, the highest of any sub-strategy. Multi-turn escalation attacks achieved 0% success (0/20). Six of eight attack categories yielded no successful attacks. Physician review of the 11 flagged high-harm cases is in progress. Conclusions: Standard medical assistant system prompts provide strong baseline protection against most adversarial attacks, but are substantially vulnerable to authority impersonation, particularly claims of educational context. The primary failure mode is behavioral mode-switching: the model provides clinically accurate but inadequately safety-framed responses when it perceives a professional audience, rather than providing factually incorrect information. This suggests that guardrail improvements should target context-conditioned behavior rather than factual accuracy alone. Our open-source taxonomy and evaluation pipeline enable ongoing adversarial assessment as medical AI systems evolve. Impact: This work provides the first systematic taxonomy and evaluation framework for medical AI adversarial testing, enabling developers to identify and remediate safety gaps before deployment. Our open-source attack taxonomy and methodology can serve as a foundation for ongoing red-teaming efforts as medical AI systems continue to evolve.
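Once each adversarial prompt has been scored on a 0-5 harm scale, summarising attack success by category is a small aggregation step. The sketch below is a generic illustration of that step with pandas on invented records, assuming the abstract's harm threshold of 3; it is not the study's evaluation pipeline, and the category and sub-strategy names here are illustrative.

```python
# Minimal sketch: summarise red-teaming results per attack category.
# Records are invented for illustration; harm_level >= 3 counts as a successful
# attack, mirroring the threshold described in the abstract.
import pandas as pd

HARM_THRESHOLD = 3

results = pd.DataFrame([
    {"category": "authority_impersonation", "sub_strategy": "educational_authority", "harm_level": 4},
    {"category": "authority_impersonation", "sub_strategy": "clinician_claim",       "harm_level": 1},
    {"category": "dangerous_dosing",        "sub_strategy": "unit_confusion",        "harm_level": 0},
    {"category": "multi_turn_escalation",   "sub_strategy": "gradual_reframing",     "harm_level": 2},
])

results["attack_success"] = results["harm_level"] >= HARM_THRESHOLD

summary = (results.groupby("category")
           .agg(prompts=("attack_success", "size"),
                successes=("attack_success", "sum"),
                success_rate=("attack_success", "mean"))
           .sort_values("success_rate", ascending=False))
print(summary)

# Prompts flagged for physician review
print(results[results["attack_success"]][["category", "sub_strategy", "harm_level"]])
```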
Zafar, W.; Tavares, S.; Hu, Y.; Brubaker, L.; Green, J.; Mehta, S.; Grams, M. E.; Chang, A. R.
Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria, done through spot urine albumin-creatinine ratio (UACR) testing, enables more accurate risk stratification and timely use of preventative therapies, yet testing rates remain unacceptably low in the hypertension population. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health System intended to increase UACR testing in individuals with hypertension: an OurPractice Advisory (OPA) from Jan 2022 to Aug 2022, and a Health Maintenance Topic (HMT) in the Care Gaps section of Storyboard from Aug 2022 that continues to date. We evaluated UACR rates from 2020 to 2023 in Geisinger primary care and compared them to a control group of healthcare systems in the Optum Labs Data Warehouse (OLDW). Patients were excluded if they had UACR testing in the preceding 3 years, had diabetes or CKD, or were receiving palliative/hospice care. Results: We included 58,876 individuals in Geisinger (mean age 59.4 years, 49.6% female) and 1,427,754 in OLDW (61.0 years, 49% female). UACR testing in Geisinger (2.97% in 2020; 2.8% in 2021; 9.7% in 2022; 17.5% in 2023) showed a significant increase compared to the control health systems (2.08%, 2.26%, 3.35% and 3.40%, respectively). Results were consistent after adjusting for age, sex and race. Conclusion: The OPA increased UACR testing ~3-fold, whereas the HMT was associated with further improvements (~6-fold vs. baseline) among those with hypertension, suggesting an important role for CDS design in closing care gaps.
Vasquez-Venegas, C.; Chewcharat, A.; Kimera, R.; Kurtzman, N.; Leite, M.; Woite, N. L.; Muppidi, I. J.; Muppidi, R. J.; Liu, X.; Ong, E. P.; Pal, R.; Myers, C.; Salzman, S.; Patscheider, J. S.; John, T. R.; Rogers, M.; Samuel, M.; Santana-Guerrero, J. L.; Yaacob, S.; Gameiro, R. R.; Celi, L. A.
Computer vision models for chest X-ray interpretation hold significant promise for global healthcare, but their clinical value depends on equitable development across diverse populations. We conducted a scientometric analysis to examine authorship patterns, geographic distribution, and dataset origins to assess potential disparities that could affect clinical applicability. We systematically reviewed literature on computer vision applications for chest X-rays published between 2017 and 2025 across multiple databases, including PubMed, Embase and SciELO. Using the Dimensions API and manual extraction, we analyzed 928 eligible studies, examining first and senior author affiliations, institutional contributions, dataset provenance, and collaboration patterns across different income classifications based on World Bank categories. High-income countries dominated research leadership, representing 55.6% of first authors and 59.7% of senior authors; no first authors were affiliated with low-income countries. China (16.93%) and the United States (16.72%) led in first authorship positions. Most datasets (73.6%) originated from high-income settings, with the United States being the largest contributor (40.45%). Private datasets were most frequently used (20.52%). Cross-income collaborations were rare, with only 3.9% of publications involving partnerships between high-income and lower-middle-income countries. Findings reveal substantial disparities in who shapes computer vision research on chest X-rays and which populations are represented in training data. These imbalances risk developing AI systems that perform inconsistently across diverse healthcare settings, potentially exacerbating healthcare inequities. Addressing these disparities requires coordinated efforts to develop globally representative datasets, establish equitable international collaborations, and implement policies that promote inclusive research practices.
Calderon, P. F.; Wolosker, N.
Objective: Develop a methodology to implement action plans that mitigate the negative impacts associated with the EHR implementation project and evaluate their effectiveness in reducing these issues. Methods: The research involved the development of mitigation plans for the potential negative impacts of implementing an electronic health record system, ensuring their execution and subsequently analyzing the effectiveness of the method. Results: Findings confirmed that 19.3% of 264 identified impacts were resolved through 52 plans before Go Live. During Go Live, the remaining 213 impacts were addressed through 337 plans. Six months later, 190 impacts were confirmed, and the plans were considered effective or partially effective in 80.5% of cases. Conclusions: Effective governance, a multidisciplinary methodology, and well-planned and executed actions increase the likelihood of success for health technology projects.
Al-Dabbas, Z.; Khandakji, L.; Al-Shatarat, N.; Alqaisiah, H.; Ibrahim, Y.; Awed, T.; Baik, H.; Dawoud, M.; Ali, R. A.-H.; Telfah, Z.; Al-Hmaid, Y.; Alsharkawi, A.
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare and five attitudinal domains, namely perceived usefulness or performance expectancy, trust and transparency, privacy and perceived risks, empathy and human interaction, and readiness or behavioral intention, using 25 items on 5-point Likert scales. Patients expressed conditional optimism: empathy and human interaction was most strongly endorsed (M = 4.33, SD = 0.58), alongside relatively high perceived usefulness (M = 3.97, SD = 0.68), while trust and transparency (M = 3.57, SD = 0.74) and readiness (M = 3.66, SD = 0.90) were moderate to high; privacy and risk concerns were moderate (M = 3.51, SD = 0.77) and self-reported exposure was lowest (M = 2.57, SD = 1.07). The highest-agreement item indicated a preference for AI to work alongside physicians rather than be relied on alone (M = 4.47, SD = 0.81). Trust and transparency and perceived usefulness were positively associated with readiness (r = 0.48 and r = 0.44, respectively; p < .001), while privacy and perceived risks were negatively correlated with trust and usefulness. In multivariable regression adjusting for gender, age group, education, prior AI health app or device use, and self-rated digital skill, lower educational attainment (less than high school and high school) predicted reduced readiness, whereas higher digital skill predicted increased readiness (R2 = 0.101). These findings suggest that implementation strategies in Jordan should emphasize human involvement alongside AI, transparent communication and governance, and interventions that build digital confidence and reduce readiness gaps linked to education. Author Summary: AI is increasingly used in healthcare, for example to support diagnosis, triage, and treatment decisions. Whether these tools are accepted by patients depends not only on how well they work, but also on whether patients trust them, understand how they are used, and feel their privacy is protected. Evidence on patient views in middle-income and resource-constrained settings is still limited. We surveyed 500 patients attending hospitals in three Jordanian governorates to understand how they view AI-supported care. Patients generally expected AI to be useful, but they strongly preferred that clinicians remain actively involved and that AI supports rather than replaces physicians. Trust and perceived usefulness were closely linked to willingness to accept AI-enabled care, while privacy concerns were present and shaped trust. Readiness to accept AI was lower among participants with lower educational attainment and higher among those with greater self-rated digital skill. These findings suggest that successful implementation in Jordan should prioritize transparent communication, strong privacy safeguards, and human-centered workflows, while also strengthening digital confidence to avoid widening gaps in acceptance.
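The analysis pattern here (domain scores from Likert items, Pearson correlations with readiness, then a multivariable regression on education and digital skill) is straightforward to reproduce in outline. The sketch below is a generic illustration on synthetic data with assumed column names, not the study's analysis code or dataset.

```python
# Generic sketch of the survey analysis pattern: correlate attitudinal domain
# scores with readiness, then regress readiness on predictors with OLS.
# Synthetic data and column names are assumptions, not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "trust": rng.normal(3.6, 0.7, n),
    "usefulness": rng.normal(4.0, 0.7, n),
    "privacy_risk": rng.normal(3.5, 0.8, n),
    "digital_skill": rng.normal(3.2, 1.0, n),
    "education_low": rng.integers(0, 2, n),      # 1 = high school or less
})
# Simulated readiness score loosely following the reported associations
df["readiness"] = (0.4 * df["trust"] + 0.3 * df["usefulness"]
                   + 0.2 * df["digital_skill"] - 0.3 * df["education_low"]
                   + rng.normal(0, 0.8, n))

# Domain-readiness correlations
for col in ["trust", "usefulness", "privacy_risk"]:
    r, p = pearsonr(df[col], df["readiness"])
    print(f"{col}: r = {r:.2f}, p = {p:.3g}")

# Multivariable model of readiness on education and digital skill
model = smf.ols("readiness ~ education_low + digital_skill", data=df).fit()
print(model.summary())
```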
Ng, J. Y.; Bhavsar, D.; Krishnamurthy, M.; Dhanvanthry, N.; Fry, D.; Kim, J. W.; King, A.; Lai, J.; Makwanda, A.; Olugbemiro, P.; Patel, J.; Virani, I.; Ying, E.; Yong, K.; Zaidi, A.; Zouhair, J.; Lee, M. S.; Lee, Y.-S.; Nesari, T. M.; Ostermann, T.; Witt, C. M.; Zhong, L.; Cramer, H.
Background: Artificial intelligence chatbots (AICs) are increasingly being integrated into scholarly publishing, with the potential to automate routine editorial tasks and streamline workflows. In traditional, complementary, and integrative medicine (TCIM) publishing, editorial and peer review processes can be particularly complex due to diverse methodologies and culturally embedded knowledge systems, presenting unique opportunities and challenges for AIC adoption. Methods: An anonymous, online cross-sectional survey was distributed to the editorial board members of 115 TCIM journals. The survey assessed familiarity and current use of AICs, perceived benefits and challenges, ethical concerns, and anticipated future roles in editorial workflows. Results: Of 5,119 invitations, 217 eligible participants completed the survey. While approximately 70% of respondents reported familiarity with AI tools, over 60% had never used AICs for editorial tasks. Editors expressed strongest support for text-focused applications, such as grammar and language checks (81.0%) and plagiarism/ethical screening (67.4%). Most respondents (82.8%) believed that AICs would be important or very important to the future of scholarly publishing; however, the majority (65.3%) reported that their journals lacked AI-specific policies and training programs to guide editors and peer reviewers. Conclusions: Most TCIM editors believe that AICs have the potential to support routine editorial functions but also report limited adoption into editorial and peer review processes due to practical, ethical, and institutional barriers. Additional training and guidance from journals are warranted to direct responsible and ethical use if AICs are to be adopted in TCIM academic publishing.